clValid: An R Package for Cluster Validation
The R package clValid contains functions for validating the results of a clustering analysis. There are three main types of cluster validation measures available, "internal", "stability", and "biological". The user can choose from nine clustering algorithms in existing R packages, including hierarchical, K-means, self-organizing maps (SOM), and model-based clustering. In addition, we provide a function to perform the self-organizing tree algorithm (SOTA) method of clustering. Any combination of validation measures and clustering methods can be requested in a single function call. This allows the user to simultaneously evaluate several clustering algorithms while varying the number of clusters, to help determine the most appropriate method and number of clusters for the dataset of interest. Additionally, the package can automatically make use of the biological information contained in the Gene Ontology (GO) database to calculate the biological validation measures, via the annotation packages available in Bioconductor. The function returns an object of S4 class "clValid", which has summary, plot, print, and additional methods which allow the user to display the optimal validation scores and extract clustering results.
Computational biology touches all bases
A report of the 6th Annual Rocky Mountain Bioinformatics Conference, Aspen, USA, 4-7 December 2008
RankAggreg, an R package for weighted rank aggregation
Background: Researchers in the field of bioinformatics often face the challenge of combining several ordered lists in a proper and efficient manner. Rank aggregation techniques offer a general and flexible framework that allows one to objectively perform the necessary aggregation. With the rapid growth of high-throughput genomic and proteomic studies, the potential utility of rank aggregation in the context of meta-analysis becomes even more apparent. One of the major strengths of rank-based aggregation is the ability to combine lists coming from different sources and platforms, for example different microarray chips, which may or may not be directly comparable otherwise. Results: The RankAggreg package provides two methods for combining the ordered lists: the Cross-Entropy method and the Genetic Algorithm. Two examples of rank aggregation using the package are given in the manuscript: one in the context of clustering based on gene expression, and the other in the context of meta-analysis of prostate cancer microarray experiments. Conclusion: The two examples described in the manuscript clearly show the utility of the RankAggreg package in the current bioinformatics context, where ordered lists are routinely produced as a result of modern high-throughput technologies.
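To make the idea of weighted rank aggregation concrete, here is a minimal Python sketch. It is not the RankAggreg API and does not use the Cross-Entropy method or the Genetic Algorithm; it swaps in an exhaustive search, which is only feasible for very short lists. The function names `footrule` and `aggregate`, the choice of the Spearman footrule distance, and the gene labels are all assumptions made for illustration.

```python
from itertools import permutations

def footrule(candidate, ranked_list):
    """Spearman footrule distance: sum of absolute position differences."""
    position = {item: i for i, item in enumerate(ranked_list)}
    return sum(abs(i - position[item]) for i, item in enumerate(candidate))

def aggregate(ranked_lists, weights=None):
    """Return the ordering that minimizes the weighted total footrule distance.

    Exhaustive search over all permutations (illustrative stand-in for the
    stochastic search methods named in the abstract). Assumes every input
    list ranks the same set of items.
    """
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    items = sorted(ranked_lists[0])

    def cost(candidate):
        return sum(w * footrule(candidate, lst)
                   for w, lst in zip(weights, ranked_lists))

    return min(permutations(items), key=cost)

# Three hypothetical ordered gene lists from different sources.
lists = [
    ["geneA", "geneB", "geneC", "geneD"],
    ["geneA", "geneC", "geneB", "geneD"],
    ["geneB", "geneA", "geneC", "geneD"],
]
consensus = aggregate(lists)
weighted_consensus = aggregate(lists, weights=[1.0, 5.0, 1.0])
print(consensus)
print(weighted_consensus)
```

Raising the weight of one list pulls the consensus ordering toward that list, which is the role the weights play in a weighted aggregation.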
SeqNet: An R Package for Generating Gene-Gene Networks and Simulating RNA-Seq Data
Gene expression data provide an abundant resource for inferring connections in gene regulatory networks. While methodologies developed for this task have shown success, a challenge remains in comparing the performance among methods. Gold-standard datasets are scarce and limited in use. And while tools for simulating expression data are available, they are not designed to resemble the data obtained from RNA-seq experiments. SeqNet is an R package that provides tools for generating a rich variety of gene network structures and simulating RNA-seq data from them. This produces in silico RNA-seq data for benchmarking and assessing gene network inference methods. The package is available from the Comprehensive R Archive Network at https://CRAN.R-project.org/package=SeqNet and on GitHub at https://github.com/tgrimes/SeqNet.
Differential Co-Abundance Network Analyses for Microbiome Data Adjusted for Clinical Covariates Using Jackknife Pseudo-Values
A recent breakthrough in differential network (DN) analysis of microbiome
data has been realized with the advent of next-generation sequencing
technologies. The DN analysis disentangles the microbial co-abundance among
taxa by comparing the network properties between two or more graphs under
different biological conditions. However, existing methods for DN
analysis of microbiome data do not adjust for other clinical differences
between subjects. We propose a Statistical Approach via Pseudo-value
Information and Estimation for Differential Network Analysis (SOHPIE-DNA) that
incorporates additional covariates such as continuous age and categorical BMI.
SOHPIE-DNA is a regression technique adopting jackknife pseudo-values that can
be implemented readily for the analysis. We demonstrate through simulations
that SOHPIE-DNA consistently reaches higher recall and F1-score, while
maintaining similar precision and accuracy to existing methods (NetCoMi and
MDiNE). Lastly, we apply SOHPIE-DNA to two real datasets from the American Gut
Project and the Diet Exchange Study to showcase its utility. The analysis of
the Diet Exchange Study shows that SOHPIE-DNA can also be used to
incorporate the temporal change of connectivity of taxa with the inclusion of
additional covariates. As a result, our method has found taxa that are related
to the prevention of intestinal inflammation and severity of fatigue in
advanced metastatic cancer patients. Comment: 23 pages, 2 figures, 4 tables
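The jackknife pseudo-values mentioned in the abstract above can be sketched in a few lines of Python. For an estimator computed on n observations, the i-th pseudo-value is n times the full-sample estimate minus (n - 1) times the estimate with observation i left out. This is a generic illustration, not the SOHPIE-DNA implementation: the function name `jackknife_pseudo_values` and the use of the sample mean as the estimator are assumptions, and in the method described above the estimator would instead be a network-based quantity whose pseudo-values are then regressed on covariates.

```python
def jackknife_pseudo_values(data, estimator):
    """Return one jackknife pseudo-value per observation.

    pseudo_i = n * estimate(all data) - (n - 1) * estimate(data without i)
    """
    n = len(data)
    full_estimate = estimator(data)
    return [
        n * full_estimate - (n - 1) * estimator(data[:i] + data[i + 1:])
        for i in range(n)
    ]

def mean(values):
    """Placeholder estimator (assumption); any statistic of the sample works."""
    return sum(values) / len(values)

sample = [2.0, 4.0, 6.0, 8.0]
pseudo = jackknife_pseudo_values(sample, mean)
print(pseudo)
```

For the sample mean, each pseudo-value reduces to the observation itself (up to floating-point rounding), which makes this a convenient sanity check before plugging in a more complex estimator.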
Adjusting for informative cluster size in pseudo-value based regression approaches with clustered time to event data
Informative cluster size (ICS) arises in situations with clustered data where
a latent relationship exists between the number of participants in a cluster
and the outcome measures. Although this phenomenon has been sporadically
reported in statistical literature for nearly two decades now, further
exploration is needed in certain statistical methodologies to avoid potentially
misleading inferences. For inference about population quantities without
covariates, inverse cluster size reweightings are often employed to adjust for
ICS. Further, to study the effect of covariates on disease progression
described by a multistate model, the pseudo-value regression technique has
gained popularity in time-to-event data analysis. We seek to answer the
question: "How to apply pseudo-value regression to clustered time-to-event data
when cluster size is informative?" ICS adjustment by the reweighting method can
be performed in two steps: estimation of marginal functions of the multistate
model and fitting the estimating equations based on pseudo-value responses,
leading to four possible strategies. We present theoretical arguments and
thorough simulation experiments to ascertain the correct strategy for adjusting
for ICS. A further extension of our methodology is implemented to include
informativeness induced by the intra-cluster group size. We demonstrate the
methods in two real-world applications: (i) to determine predictors of tooth
survival in a periodontal study, and (ii) to identify indicators of ambulatory
recovery in spinal cord injury patients who participated in locomotor-training
rehabilitation. Comment: 22 pages, 4 figures, 4 tables
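The effect of inverse cluster size reweighting can be seen in a small Python sketch. It is deliberately simplified to a marginal mean of a binary outcome rather than the multistate time-to-event setting of the abstract above, and the function names and toy data are assumptions. Each observation in a cluster of size n_k receives weight 1/n_k, so every cluster contributes equally regardless of its size.

```python
def pooled_mean(clusters):
    """Unweighted mean over all observations; large clusters dominate."""
    values = [y for cluster in clusters for y in cluster]
    return sum(values) / len(values)

def inverse_cluster_size_mean(clusters):
    """Mean with weight 1/n_k per observation, i.e. the mean of cluster means."""
    cluster_means = [sum(cluster) / len(cluster) for cluster in clusters]
    return sum(cluster_means) / len(cluster_means)

# Hypothetical data: the outcome is 1 only in the largest cluster, so
# cluster size carries information about the outcome.
clusters = [[1.0, 1.0, 1.0, 1.0], [0.0], [0.0]]

naive = pooled_mean(clusters)
adjusted = inverse_cluster_size_mean(clusters)
print(naive, adjusted)
```

Here the pooled mean is 4/6 while the reweighted mean is 1/3, showing how ignoring an informative cluster size can shift an estimate toward the behaviour of the largest clusters.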